A scalable Helmholtz solver in GRAPES over large-scale multicore cluster
Abstract
This paper discusses performance optimization of the dynamical core of the global numerical weather prediction model in the Global/Regional Assimilation and Prediction System (GRAPES). GRAPES is a new-generation numerical weather prediction system developed and currently used by the China Meteorological Administration. The computational performance of the dynamical core in GRAPES relies on the efficient solution of three-dimensional Helmholtz equations, whose discretization in space and time leads to large, sparse linear systems. We choose the generalized conjugate residual (GCR) algorithm to solve these linear systems and propose algorithm optimizations for large-scale parallelism in two respects: (i) reducing the number of iterations needed for a solution and (ii) improving the performance of each GCR iteration. The iteration count is reduced by advanced preconditioning techniques that combine a block incomplete LU factorization, ILU(k), preconditioner over the 7 diagonals of the coefficient matrix with the restricted additive Schwarz method. Each GCR iteration is improved by refactoring the GCR algorithm to reduce the number of global communication operations, which decreases the communication overhead on large numbers of cores. Performance evaluation on the Tianhe-1A system shows that the new preconditioning techniques reduce the number of iterations needed to solve the linear systems by almost one-third, that the proposed methods obtain a 25% performance improvement on average over the original Helmholtz solver in GRAPES, and that the speedup with our algorithms reaches 10 when using 2048 cores compared with 256 cores. Copyright © 2013 John Wiley & Sons, Ltd.
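As a rough illustration of the solver named in the abstract, the following is a minimal serial sketch of a preconditioned GCR iteration in Python with NumPy. It is not the GRAPES implementation: the function name `gcr`, the restart length, the small nonsymmetric tridiagonal test matrix, and the Jacobi (diagonal) preconditioner are all assumptions chosen to keep the example short, and the Jacobi step merely stands in for the block ILU(k) and restricted additive Schwarz preconditioner described above. The comments mark the inner products, which in a distributed-memory code would each require a global reduction; these are the operations the abstract says the refactored algorithm reduces.

```python
import numpy as np

def gcr(A, b, M_inv=None, tol=1e-10, max_iter=200, restart=20):
    """Restarted, preconditioned GCR for A x = b (illustrative sketch).

    M_inv is a hypothetical callable applying an approximate inverse of A
    to a vector; None means no preconditioning.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    b_norm = np.linalg.norm(b) or 1.0
    P, Q = [], []                      # search directions p_j and q_j = A p_j
    for it in range(max_iter):
        if np.linalg.norm(r) / b_norm < tol:   # norm: a global reduction
            return x, it
        if len(P) == restart:          # bound memory by restarting
            P, Q = [], []
        # Copy so the in-place updates below never alias r.
        p = np.array(M_inv(r) if M_inv is not None else r, dtype=float)
        q = A @ p
        # Orthogonalize q against stored q_j (each dot: a global reduction).
        for pj, qj in zip(P, Q):
            beta = q @ qj              # q_j are stored with unit norm
            p -= beta * pj
            q -= beta * qj
        q_norm = np.linalg.norm(q)
        if q_norm == 0.0:              # breakdown: no further progress
            return x, it
        p /= q_norm
        q /= q_norm
        alpha = r @ q                  # step length minimizing ||r - alpha q||
        x += alpha * p
        r -= alpha * q
        P.append(p)
        Q.append(q)
    return x, max_iter

# Small nonsymmetric, diagonally dominant test system (an assumption,
# not the GRAPES Helmholtz operator).
n = 50
A = 4.0 * np.eye(n) - 1.0 * np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
b = A @ np.ones(n)
diag = np.diag(A).copy()
x, iters = gcr(A, b, M_inv=lambda v: v / diag)   # Jacobi stand-in
print(iters, np.linalg.norm(A @ x - b))
```

A stronger preconditioner lowers the number of passes through the loop, while fusing or batching the inner products lowers the cost of each pass on many cores; these correspond to the two optimization directions listed in the abstract.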
Similar resources
GRAPES: A Software for Parallel Searching on Biological Graphs Targeting Multi-Core Architectures
Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published...
Delft University of Technology Report 11-01: A Scalable Helmholtz Solver Combining the Shifted Laplace Preconditioner with Multigrid Deflation
A Helmholtz solver whose convergence is parameter independent can be obtained by combining the shifted Laplace preconditioner with multigrid deflation. To prove this claim, we develop a Fourier analysis of a two-level variant of the algorithm proposed in [1]. In this algorithm those eigenvalues that prevent the shifted Laplace preconditioner from being scalable are removed by deflation using mu...
Efficient heterogeneous execution on large multicore and accelerator platforms: Case study using a block tridiagonal solver
The algorithmic and implementation principles are explored in gainfully exploiting GPU accelerators in conjunction with multicore processors on high-end systems with large numbers of compute nodes, and evaluated in an implementation of a scalable block tridiagonal solver. The accelerator of each compute node is exploited in combination with multicore processors of that node in performing block-...
Mixed Large-Eddy Simulation Model for Turbulent Flows across Tube Bundles Using Parallel Coupled Multiblock NS Solver
In this study, turbulent flow around a tube bundle on a non-orthogonal grid is simulated using the Large Eddy Simulation (LES) technique and parallelization of the fully coupled Navier-Stokes (NS) equations. To model the small eddies, the Smagorinsky model and a mixed model were used. This model represents the effect of dissipation and the grid-scale and subgrid-scale interactions. The fully coupled NS eq...
PSPIKE: A Parallel Hybrid Sparse Linear System Solver
The availability of large-scale computing platforms comprised of tens of thousands of multicore processors motivates the need for the next generation of highly scalable sparse linear system solvers. These solvers must optimize parallel performance, processor (serial) performance, as well as memory requirements, while being robust across broad classes of applications and systems. In this paper, ...
Journal title:
- Concurrency and Computation: Practice and Experience
Volume 25, Issue
Pages -
Publication date 2013